feat(codec): PR-X12 A2 mode bit-pack + A3-intra prediction kernel#195
Conversation
A2 — mode.rs (~270 lines, 11 tests):
- 16-bit header: 2-bit mode (Skip/Merge/Delta/Escape) + 12-bit basin_idx
- MergeDir 2-bit pack/unpack (high bits masked)
- Whole-leaf compact pack/unpack: Skip=2B, Merge=3B, Delta=3B, Escape=6B
- packed_byte_len() const fn for buffer pre-sizing
- MAX_BASIN_IDX (4095) + BASIN_NONE sentinel
- Stream roundtrip test for mixed-mode leaves
A3-intra — predict.rs (~330 lines, 12 tests):
- IntraContext { basin_idx, delta_i32, NESW neighbours }
- IntraConfig { escape_next_idx: Option<u32> }
- predict_intra() decision tree: Skip → Merge → Delta → Escape
(monotone wire cost 2 → 3 → 3 → 6 bytes; cheapest-fit policy)
- Merge match: same basin_idx + same δ as u8 (sign-tolerant wrapping cast)
- Escape fallback: lossy i8 clamp when allocator absent (never panics)
- End-to-end pack/unpack chain test through the decision
Deferred to follow-up: A3-inter (cross-tier neighbour scan from
BlockedGrid L2/L3), A4 transform, A6 RDO, A7 rANS, A8 stream framing.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR adds two new modules: ChangesCodec Leaf Serialization and Intra Prediction
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/hpc/codec/mode.rs`:
- Around line 58-64: The public constants and functions (MAX_BASIN_IDX,
BASIN_NONE, unpack_header, pack_merge_dir, unpack_merge_dir, unpack_leaf,
packed_byte_len and the other public items referenced around lines 84-89,
105-116, 182-221) are missing doctest examples; add short /// ```rust examples
that demonstrate basic usage and assertions so they compile as doctests (or use
no_run when appropriate). For each symbol, add one-line examples showing
input->output (e.g., assert_eq!(MAX_BASIN_IDX, (1<<12)-1);
assert_eq!(BASIN_NONE, MAX_BASIN_IDX); call unpack_header with a sample packed
header and assert fields; call pack_merge_dir/unpack_merge_dir as a roundtrip
and assert equality; call unpack_leaf and packed_byte_len with a minimal valid
buffer and assert expected lengths). Ensure examples import any types needed and
are small and self-contained so rustdoc tests pass.
- Around line 153-156: pack_leaf currently rejects buffers smaller than a
hardcoded 6 bytes which ignores mode-specific sizes and causes valid
Skip/Merge/Delta buffers to be rejected; change the length check to compute the
required size using the mode-aware helper (e.g., call packed_byte_len(leaf.mode)
or the equivalent function already used elsewhere) and compare out.len() against
that required usize before returning None, ensuring packed_byte_len is in scope
and used to determine the minimum buffer length for LeafCu packing.
In `@src/hpc/codec/predict.rs`:
- Around line 67-118: Add runnable doc-tests for the public API items: provide
concise /// examples demonstrating basic usage for IntraContext, IntraConfig,
and the is_no_basin helper so they satisfy public-API docs. Specifically, add a
small doctest showing how to construct an IntraContext (populate basin_idx,
delta_i32, neighbours using Option<&LeafCu> stubs or dummy values), an example
for IntraConfig showing both Some(escape_next_idx) and None behavior, and an
example call to is_no_basin illustrating its return for a boundary/empty basin;
ensure examples compile by importing any types used or using minimal mock values
and mark them as code blocks in the doc comments for each symbol (IntraContext,
IntraConfig, is_no_basin).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4bbaaa0b-b86e-4568-854e-a5c95e160f2e
📒 Files selected for processing (3)
src/hpc/codec/mod.rssrc/hpc/codec/mode.rssrc/hpc/codec/predict.rs
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26d987f9db
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Multiple matches all collapse to the same coded leaf, so the | ||
| // first-hit policy is order-deterministic without affecting | ||
| // bitstream length. | ||
| let our_delta_u8 = ctx.delta_i32 as u8; // wrapping cast matches A2 pack |
There was a problem hiding this comment.
Prevent overflow deltas from taking Merge path
predict_intra computes our_delta_u8 with a wrapping cast and checks Merge candidates before verifying that delta_i32 fits in i8, so out-of-range deltas can incorrectly return Merge instead of Escape. For example, delta_i32 = 200 wraps to 0xC8 and can match a neighbor byte 0xC8 (which represents -56 in i8 space), producing a leaf that cannot reconstruct the original delta and silently corrupts decoded values.
Useful? React with 👍 / 👎.
| if out.len() < 6 { | ||
| return None; |
There was a problem hiding this comment.
Allow packing into mode-sized output buffers
pack_leaf currently returns None unless out.len() >= 6, even when the leaf mode only needs 2 bytes (Skip) or 3 bytes (Merge/Delta). This breaks valid callers that pre-size streams using packed_byte_len and then pass exact remaining slices, especially near the end of a buffer, causing spurious packing failures for otherwise encodable data.
Useful? React with 👍 / 👎.
AdaWorldAPI
left a comment
There was a problem hiding this comment.
Read base (ctu.rs) + both new files end-to-end. The shape is solid — clean module boundary, 45 tests, the decision tree is the right call (Skip → Merge → Delta → Escape ordered by wire cost), and A1's P1 codex fix (full LeafCu equality on merge) carries through cleanly. Two correctness / API issues are worth holding the merge on, plus a few P2s.
CI snapshot at this commit: format, clippy/1.95, MSRV, AVX-512 check, hpc-stream-parallel, no-std all green; tests/{stable,beta,1.95.0} + native-backend/stable were still in-progress when I started.
Posting as
COMMENTbecause GitHub blocksREQUEST_CHANGESon PRs authored under the same login — treat the P0 below as a request-changes in spirit.
🔴 P0 — Merge scan can fire incorrectly for out-of-i8-range δ (predict.rs:153)
let our_delta_u8 = ctx.delta_i32 as u8; // wrapping cast
for (i, nb_slot) in ctx.neighbours.iter().enumerate() {
...
if nb.delta != Some(our_delta_u8) { continue; }
return LeafCu::merge(ctx.basin_idx, dir);
}The wrapping cast collapses mod 256. A neighbor's stored δ_u8 was always written by a Delta-mode encode, so it represents an i8 in [-128, 127]. But the current cell's δ_i32 is allowed to be any i32 — if |δ| > 127, the wrapping cast can alias a neighbor's u8 even though the underlying i32 values differ by 256+. Concrete:
our_delta_i32 = 200 → our_delta_u8 = 0xC8 (200)
nb stored Delta(0xC8) ← came from δ_i32 = -56 (the only i8 that fits)
Merge fires (alias match).
Decode reconstructs basin + (-56), original was basin + 200.
→ silent data loss of 256 units.
The PR body itself says "verify the Merge sign-tolerance contract matches the eventual A7 rANS decoder's u8 read" — sign-tolerance is correct inside the i8 window. Outside it, the wrapping cast aliases and the protocol breaks.
Fix is one line — gate the scan on |δ| ≤ 127:
if (-128..=127).contains(&ctx.delta_i32) {
let our_delta_u8 = ctx.delta_i32 as u8;
for (i, nb_slot) in ctx.neighbours.iter().enumerate() { ... }
}(Or skip Merge entirely when the i8 guard fails, with a comment naming the alias.) No test catches this today — merge_negative_delta_via_wrapping_cast uses δ = -17 (in range). Add a regression test where our δ_i32 = 200 and a neighbor has Delta(_, 0xC8), assert the produced leaf is not Merge.
Caveat for the author: if the basin-nearest-neighbour search guarantees
|δ| ≤ 127for all valid inputs, this is theoretical and theescape_next_idxallocator path is dead code too. Thei32field width and the explicit Escape branch say otherwise — please confirm against the design doc.
🟡 P1 — IntraConfig::escape_next_idx: Option<u32> is a static field, not an allocator (predict.rs:107)
Doc:
Stateless API today: encoder calls
escape_next_idxonce per Escape decision.
But the impl just reads cfg.escape_next_idx — every cell in a batch with Escape mode emits the same idx. A caller doing:
for cell in cells {
let leaf = predict_intra(&ctx, &cfg); // cfg.escape_next_idx = Some(N)
if leaf.mode == Escape { escape_vec.push(cell_val); }
}gets multiple Escape leaves all pointing at idx N → escape-vector collision on decode.
Pick one of:
- (a) accept
&mut dyn FnMut() -> u32so each Escape call advances, - (b) rename the field to
escape_idx_to_assign(or similar) so the constraint that the caller must rebuildIntraConfigper Escape is loud, - (c) return the next idx alongside the leaf so the caller knows when to bump.
Test coverage gap mirrors the bug: escape_when_delta_overflows_i8_and_allocator_present only exercises one Escape. Add a two-cell test that runs predict twice and asserts the second Escape sees a different idx.
🟡 P1 — NESW / NEWS doc-vs-discriminant mismatch (predict.rs:79)
/// `neighbours`: NESW (in [`MergeDir`] discriminant order) ...
But MergeDir in ctu.rs:81-89 is North=0, East=1, West=2, South=3 — that's NEWS, not NESW. merge_dir_from_index in predict.rs:191-197 faithfully reflects the discriminant order (slot 2 → West, slot 3 → South), so a caller who reads the docstring literally and supplies neighbours as [N, E, S, W] will get every south/west merge labeled with the wrong direction.
Two ways out: (a) fix the doc to say NEWS (one-character edit), or (b) reorder MergeDir's discriminants — but that breaks the on-wire 2-bit codes pinned by cell_mode_discriminants_match_wire_codes / merge_dir_discriminants_match_wire_codes in ctu.rs and would need a consumer audit. (a) is the right move. While there, drop an explicit table in IntraContext and merge_dir_from_index:
slot 0 → MergeDir::North (discr 0)
slot 1 → MergeDir::East (discr 1)
slot 2 → MergeDir::West (discr 2)
slot 3 → MergeDir::South (discr 3)
No test exercises slot 2 or slot 3 specifically (the closest, merge_picks_first_hit_in_nesw_order, only checks slots 0 and 1). Add at least one slot-3 = South case.
P2 — header ASCII diagram contradicts implementation (mode.rs:18-26)
The diagram puts M0/M1 at the MSB end (bits 15/14):
MSB LSB
┌──┬──┬──────────────────────────────┐
│M0│M1│ basin_idx (12) │
But pack_header is (mode_bits & 0b11) << 12 | (basin & 0xFFF) — the mode lands at bits 12-13, not 14-15. The narrative one paragraph down ("remaining 2 bits at the top of the second byte are reserved for the encoder's future merge_dir overlap") is consistent with the impl, just contradicts the diagram. Roundtrip tests pass because pack/unpack are symmetric. Re-draw the diagram, or swap to << 14 if you actually wanted the top of the u16 (cleaner for future extension).
P2 — pack_leaf requires worst-case 6-byte buffer for every mode (mode.rs:142)
unpack_leaf does per-mode bounds checks (2 / 3 / 3 / 6). pack_leaf bails at out.len() < 6 regardless of mode. Documented, but it breaks the streaming pattern: a buffer pre-sized to sum(packed_byte_len(mode_for_cell)) (the natural shape, since packed_byte_len exists exactly for that) runs out of "headroom" before the last cell can pack a Skip into the remaining 2 bytes. The streaming test happens to dodge this because it sizes at strict worst-case 8 × 6.
Fix:
let needed = packed_byte_len(leaf.mode);
if out.len() < needed { return None; }Symmetric to what unpack_leaf already does.
P2 — lossy Escape fallback emits CellMode::Delta (predict.rs:179-186)
Re: the PR body's open question #2 — yes, this is worth a design call. Current behavior: encoder silently clamps δ to i8 range and emits Delta(127) (or -128). Valid bytestream, lossy reconstruction.
- If the basin lookup actually guarantees
|δ| ≤ 127, the wholeescape_next_idx/ lossy-fallback branch is dead — drop it, returnLeafCudirectly (and P0 above also goes away). - If
|δ|can genuinely exceed 127, the lossy clamp is a footgun in disguise —Result<LeafCu, EncodeError::EscapeAllocatorMissing>is the honest signature, and callers can opt into a silent-clamp wrapper if they want it.
The "never panics, lossy" stance is fine for a v1 prototype but at minimum the docstring should call out the lie of CellMode::Delta covering a value that overflows i8.
Nits
merge_skipped_when_neighbour_mode_is_not_deltaputs the Escape neighbor at slot 3 but only asserts the result mode — doesn't probe whether slot 3 truly maps to South. Slot-order coverage gap (same as P1 #3).- A2 doc table says "MergeDir 2-bit" tail but
pack_merge_dirwrites a whole byte; clarify "1 byte (low 2 bits used; high 6 reserved)". is_no_basinis a one-line eq check on apub const; consumers can writebasin == BASIN_NONEthemselves. Keep or drop, no strong view — but if you keep it, a doc-test demonstrating theBASIN_NONEsentinel use case would justify the helper.
Suggested action
Block on P0 + P1 (escape allocator) + P1 (NEWS doc) before merge. P2s can land as a follow-up commit on the same branch or a fresh PR. A3-inter / A4-A8 deferred work is correctly scoped out.
Generated by Claude Code
… pack_leaf
P1 (codex) — overflow δ no longer aliases to Merge:
predict_intra previously took `our_delta_u8 = δ as u8` BEFORE
checking i8 fit, so e.g. δ=200 wrapped to 0xC8 and could match a
neighbour byte 0xC8 (i8=-56), silently corrupting reconstruction.
Now the i8 range check gates both the Merge scan and the Delta
branch; out-of-range δ falls straight through to Escape (or the
documented lossy clamp).
+ 2 regression tests:
- overflow_delta_does_not_alias_to_merge
- overflow_delta_with_allocator_takes_escape
P2 (coderabbit + codex) — pack_leaf accepts mode-sized buffers:
pack_leaf used a 6-byte minimum for all modes; callers pre-sizing
by packed_byte_len() got spurious None for Skip(2)/Merge,Delta(3).
Length check now gates on packed_byte_len(leaf.mode).
+ 1 regression test: pack_leaf_accepts_mode_sized_buffers
P3 (coderabbit) — doctest examples on the public-API surface:
Added /// runnable examples to MAX_BASIN_IDX, BASIN_NONE,
unpack_header, pack_merge_dir, unpack_merge_dir, unpack_leaf,
packed_byte_len, IntraContext, IntraConfig, is_no_basin.
Removed unused LeafCu import from existing predict_intra doctest.
Gates:
cargo test --features codec --lib hpc::codec → 48 passed
cargo test --features codec --doc hpc::codec → 14 passed
cargo fmt --all -- --check → clean
cargo clippy --features codec --lib -- -D warnings → clean
…doc + diagram
P1 — escape allocator collision in batched encoding:
Old: cfg.escape_next_idx: Option<u32> was a static field — every
Escape leaf in a batch got the same idx, colliding on the escape
vector slot at decode time.
New: predict_intra now takes a third arg `escape_next: Option<&mut u32>`
that the kernel post-increments when Escape fires. Sequential cells
in the same batch see fresh, non-colliding idxs.
IntraConfig becomes empty (reserved for future RDO knobs).
+ 1 regression test: escape_allocator_advances_across_batched_calls
+ escape_when_delta_overflows_i8_and_allocator_present extended
to assert the cursor advances.
P1 — NESW vs NEWS doc/code mismatch:
MergeDir discriminants are North=0, East=1, West=2, South=3 (NEWS),
but the doc said "NESW". Fixed in IntraContext docstring + the
inline Merge-scan comment. Added explicit slot-to-direction table.
+ 1 regression test: merge_slot_2_maps_to_west_and_slot_3_to_south
P2 — ASCII header diagram in mode.rs:
Old diagram put M0/M1 at MSB bits 15-14, but pack_header uses
`(mode << 12) | basin`, placing mode at bits 12-13 with bits 14-15
reserved at zero. Redrew the diagram with explicit bit indices
and labelled the reserved high bits for future encoder use.
Nits:
- A2 doc table now says "Merge tail: 1 byte (low 2 bits = MergeDir;
high 6 reserved)" instead of the misleading "MergeDir 2-bit".
- Renamed merge_picks_first_hit_in_nesw_order → ..._in_news_order to
match the corrected ordering.
Already-addressed (no-op this commit):
- P0 overflow Merge alias → fixed in b39a576
- P2 pack_leaf 6-byte minimum → fixed in b39a576
Gates:
cargo test --features codec --lib hpc::codec → 50 passed (+2 new)
cargo test --features codec --doc hpc::codec → 15 passed (+1 new)
cargo fmt --all -- --check → clean
cargo clippy --features codec --lib -- -D warnings → clean
|
Thanks for the thorough review. Per-finding status after P0 — overflow Merge alias (
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/hpc/codec/mode.rs (2)
194-210:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject malformed
LeafCus instead of silently rewriting them.The
unwrap_or(...)fallbacks makepack_leafnon-bijective: an invalidMerge/Delta/Escapeleaf gets serialized as a different valid leaf (North,0, or0). That hides upstream invariant breaks and can corrupt the stream while still returningSome(...).Suggested fix
let tail_len = match leaf.mode { CellMode::Skip => 0, CellMode::Merge => { - // Caller guarantees `merge_dir.is_some()` for `Merge` mode - // (LeafCu::merge constructor enforces this). Fall back to - // North if the invariant is violated, to keep encoder - // robustness — the decoder will still produce a valid leaf. - out[2] = pack_merge_dir(leaf.merge_dir.unwrap_or(MergeDir::North)); + out[2] = pack_merge_dir(leaf.merge_dir?); 1 } CellMode::Delta => { - out[2] = leaf.delta.unwrap_or(0); + out[2] = leaf.delta?; 1 } CellMode::Escape => { - let idx = leaf.escape_idx.unwrap_or(0); + let idx = leaf.escape_idx?; out[2..6].copy_from_slice(&idx.to_le_bytes()); 4 } };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/hpc/codec/mode.rs` around lines 194 - 210, pack_leaf currently silently substitutes defaults for missing fields (using unwrap_or) for CellMode::Merge/Delta/Escape, making serialization non-bijective; change pack_leaf (the match over leaf.mode) to validate required fields instead of falling back: for CellMode::Merge require leaf.merge_dir.is_some() and return None (or propagate error) if missing, using pack_merge_dir(leaf.merge_dir.unwrap()) when present; for CellMode::Delta require leaf.delta.is_some() and use leaf.delta.unwrap(); for CellMode::Escape require leaf.escape_idx.is_some() and copy from escape_idx.unwrap().to_le_bytes(); remove the unwrap_or fallbacks so malformed LeafCu values are rejected rather than rewritten.
69-78:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
BASIN_NONEcurrently collides with a real 12-bit basin value.
MAX_BASIN_IDXis treated as encodable data, butBASIN_NONEis set to that same value. That makes basin4095ambiguous: it can round-trip throughpack_header, whileis_no_basin(4095)also treats it as “no basin”. This leaks a cross-layer contract bug into callers and prediction logic for the highest real basin index. Either reserve4095and tighten the valid range/docs/tests accordingly, or move the sentinel out of band.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/hpc/codec/mode.rs` around lines 69 - 78, BASIN_NONE currently equals MAX_BASIN_IDX, making the highest encodable basin (4095) ambiguous; change the encoding so 4095 is a reserved sentinel by reducing the valid maximum basin value and keeping BASIN_NONE==4095: set MAX_BASIN_IDX to (1<<12)-2 (4094) and keep BASIN_NONE as (1<<12)-1 (4095), then update any code and tests that validate basin ranges — specifically adjust checks in is_no_basin, pack_header, and any asserts/examples that compared BASIN_NONE to MAX_BASIN_IDX or assumed 4095 was a valid encodable basin. Ensure documentation/comments reflect the tightened valid range.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/hpc/codec/mode.rs`:
- Around line 194-210: pack_leaf currently silently substitutes defaults for
missing fields (using unwrap_or) for CellMode::Merge/Delta/Escape, making
serialization non-bijective; change pack_leaf (the match over leaf.mode) to
validate required fields instead of falling back: for CellMode::Merge require
leaf.merge_dir.is_some() and return None (or propagate error) if missing, using
pack_merge_dir(leaf.merge_dir.unwrap()) when present; for CellMode::Delta
require leaf.delta.is_some() and use leaf.delta.unwrap(); for CellMode::Escape
require leaf.escape_idx.is_some() and copy from
escape_idx.unwrap().to_le_bytes(); remove the unwrap_or fallbacks so malformed
LeafCu values are rejected rather than rewritten.
- Around line 69-78: BASIN_NONE currently equals MAX_BASIN_IDX, making the
highest encodable basin (4095) ambiguous; change the encoding so 4095 is a
reserved sentinel by reducing the valid maximum basin value and keeping
BASIN_NONE==4095: set MAX_BASIN_IDX to (1<<12)-2 (4094) and keep BASIN_NONE as
(1<<12)-1 (4095), then update any code and tests that validate basin ranges —
specifically adjust checks in is_no_basin, pack_header, and any asserts/examples
that compared BASIN_NONE to MAX_BASIN_IDX or assumed 4095 was a valid encodable
basin. Ensure documentation/comments reflect the tightened valid range.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: cc9f0222-a5ee-45ba-8e39-fba9c0741545
📒 Files selected for processing (2)
src/hpc/codec/mode.rssrc/hpc/codec/predict.rs
722-line knowledge doc capturing the architectural claim that PR-X12's primitives (LeafCu, pack_leaf, predict_intra, Ctu, eventual rANS) are not video-codec-specific — they serve four loads on a single predictive-coder substrate: 1. Cognitive cells (the original target) 2. 3D Gaussian Splat coefficient compression 3. Transformer attention sparsification 4. Distributed-SGD gradient streaming Sections: - The 4-load isomorphism (§ 2) and 16-state (CellMode × MergeDir) classification table across all four domains - Primitive → load mapping matrix (§ 3) with file:line refs - 10 cross-domain epiphanies (§ 4) — including MergeDir-as-topology, predict_intra-as-attention-sparsifier, escape_next-as-allreduce-slot, mode-coded LoRA - 6 integration plans (§ 5) with worker × week + dependency graph - 8 exploration paths (§ 6) for sprint-scale research - 6 holy grail outcomes (§ 7) — sub-1-bit/Gaussian 3DGS, bit-exact attention, federated SGD at 8-16×, Lance substrate identity - 10 codec-side debts (§ 8) and 13 stack-side debts (§ 9) - Sequencing summary (§ 10): critical path = A7 rANS Companion to pr-x12-codec-x265-design.md (the mechanical design); a "See also" breadcrumb added there points here.
Lands `.claude/knowledge/pr-x12-codec-cognitive-substrate-mapping.md` (~576 lines) as the survives-compaction companion to the as-shipped `pr-x12-codec-x265-design.md`. Generalises the codec spec across the four industries it implicitly unifies (HEVC video, Gaussian splat, cognitive shaders / sparse attention, gradient compression for LLM training). # Why this doc Triggered by the PR #195 review session. PR-X12's mode taxonomy (Skip/Merge/Delta/Escape) shows up under different names in four disconnected research communities; each treats its own corner as the central knob (HEVC RDO, ZeRO buckets, splat sparsity reg, attention pruning). Without explicit naming of the unification, future agents will rediscover each corner independently and reimplement what the codec already provides. # Structure (citable by section number) - §0 — the big claim (PR-X12 is the gradient-quantisation substrate GenAI training has been missing for two years) - §1 — four-axis mapping table (x265 / splat / cognitive / gradient) - §2-§7 — deep mappings (mode taxonomy, CTU quad-tree, palette/basin codebook, transform basis, rANS, λ-RDO) - §8 — 15 numbered epiphanies (E-1..E-15) - §9 — 7 holy grail claims (H-1..H-7) - §10 — integration plan per sub-card (A4/A6/A7/A8) + 3 new PRs (splat, cognitive, gradient consumers) - §11 — exploration paths ranked by confidence (15 entries) - §12 — technical debt inventory (23 items T-1..T-23 across codec, ndarray substrate, lance-graph cognitive, cross-repo, PR #195) - §13 — 6 blasgraph/MKL synergies the HEVC team couldn't reach in 2013 (block-matched ME, batched DCT, partition tree as tropical- GEMM, CABAC replacement, learned deblocking, k-means at frame rate) - §14 — cross-references (design docs, rules, code paths) - §15 — how to use this doc (read order per use case) # Holy grail claims (load-bearing, citable) - H-1: PR-X12 + cam_pq is the HEVC SCC codec 2013 hardware couldn't build (4096-entry codebook at 60 fps) - H-2: The transform IS the optimiser (DCT-II ↔ Adam ↔ KFAC ↔ learned conv all share `Δ' = B·Δ`) — most underrated mapping - H-3: CTU quad-tree is the universal hierarchical-attention substrate (Mistral sliding window is depth-3 leaf processing) - H-4: rANS + k-means achieves Shannon-optimal lossless gradient compression (every published scheme is a special case) - H-5: PR-X12 generalises ZeRO (Merge captures parameter sharing that ZeRO's 3-bucket scheme can't) - H-6: 64×64 CTU is the right unit for both 4K video and 7B LLMs - H-7: The codec is the substrate; everything else is a renaming # Technical debt (23 items) Severity gradient P0/P1/P2/P3. Notable open items: - T-1, T-2: PR #195 CodeRabbit findings (BASIN_NONE collision + unwrap_or non-bijection) — actionable today - T-10..T-15: ndarray substrate (HPC graduation incomplete, no `Transform` trait yet, NEON tile-GEMM stub) - T-20..T-23: cross-repo coordination (cognitive-substrate- convergence-v1.md cross-ref, causal-edge v2 metadata wiring) # Board hygiene AGENT_LOG.md prepended with the session entry per CLAUDE.md rule "every PR adding a finding/agent run must PREPEND an AGENT_LOG entry in the same commit". Entry cites the doc structure, the 15 epiphanies + 7 holy grails by ID, and the 23 debt items. # No code changes Pure architecture / knowledge artifact. `cargo check --lib` clean (nothing to recheck). https://claude.ai/code/session_01HbqooFZHAjaUtFEzhA1R2u
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/knowledge/pr-x12-cross-domain-synergies.md:
- Line 682: The fenced code block that begins with the ASCII diagram containing
"Plan A (rANS A7)" is missing a language tag and triggers markdownlint MD040;
update the opening fence (the triple backticks that precede the diagram) to
include a language tag such as text (i.e., change ``` to ```text) so the block
reads as a language-tagged fenced code block and keep the closing fence as-is;
locate the block by searching for the triple backticks immediately before the
"Plan A (rANS A7)" diagram.
- Line 66: The fenced code block at line 66 is missing a language tag, causing
MD040; update the opening fence for that block (the triple-backtick before the
ASCII diagram) to include a language identifier such as "text" (e.g., change ```
to ```text) so the block is properly tagged and the linter warning is resolved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4c69f065-7699-4cde-bc00-528fbdba23af
📒 Files selected for processing (2)
.claude/knowledge/pr-x12-codec-x265-design.md.claude/knowledge/pr-x12-cross-domain-synergies.md
✅ Files skipped from review due to trivial changes (1)
- .claude/knowledge/pr-x12-codec-x265-design.md
|
|
||
| All four loads share the same predictive-coding skeleton: | ||
|
|
||
| ``` |
There was a problem hiding this comment.
Add a language tag to the fenced block at Line 66.
This fence is missing a language identifier, which triggers MD040.
💡 Suggested fix
-```
+```text
┌──────────────────────────────────────┐
│ 1. Build basin codebook (offline or │
│ online k-means on the carrier) │
@@
-```
+```🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 66-66: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/knowledge/pr-x12-cross-domain-synergies.md at line 66, The fenced
code block at line 66 is missing a language tag, causing MD040; update the
opening fence for that block (the triple-backtick before the ASCII diagram) to
include a language identifier such as "text" (e.g., change ``` to ```text) so
the block is properly tagged and the linter warning is resolved.
|
|
||
| If we commit to all of this, the order matters: | ||
|
|
||
| ``` |
There was a problem hiding this comment.
Add a language tag to the fenced block at Line 682.
Same markdownlint issue (MD040) here; add a language to keep docs lint-clean.
💡 Suggested fix
-```
+```text
Plan A (rANS A7)
│
▼
@@
-```
+```🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 682-682: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/knowledge/pr-x12-cross-domain-synergies.md at line 682, The fenced
code block that begins with the ASCII diagram containing "Plan A (rANS A7)" is
missing a language tag and triggers markdownlint MD040; update the opening fence
(the triple backticks that precede the diagram) to include a language tag such
as text (i.e., change ``` to ```text) so the block reads as a language-tagged
fenced code block and keep the closing fence as-is; locate the block by
searching for the triple backticks immediately before the "Plan A (rANS A7)"
diagram.
…ASIN_NONE distinctness Two real bugs flagged by CodeRabbit's outside-diff review on b44fe59. P1 — `pack_leaf` was non-bijective via unwrap_or fallbacks (mode.rs:194): Hand-constructing a `LeafCu { mode: Merge, merge_dir: None, .. }` serialized as `LeafCu::merge(.., MergeDir::North)` — silently rewriting malformed input into a different valid leaf and hiding upstream invariant breaks. Now uses `?` to short-circuit with None when required fields are missing for the mode (Merge needs merge_dir; Delta needs delta; Escape needs escape_idx). The `LeafCu::merge/delta/escape` constructors still enforce the invariants — only struct-literal bypass callers can hit the rejection. + 3 regression tests: leaf_pack_rejects_malformed_merge_without_dir, leaf_pack_rejects_malformed_delta_without_value, leaf_pack_rejects_malformed_escape_without_idx + pack_leaf docstring documents the bijective contract. P1 — `BASIN_NONE` collided with `MAX_BASIN_IDX` at 4095 (mode.rs:69): Both equaled `(1 << 12) - 1`, so basin 4095 was ambiguous — a real basin and the "no basin" sentinel had the same encoded value. Now: MAX_BASIN_IDX = 4094 (highest real basin), BASIN_NONE = 4095 (sentinel one slot above). Introduced private BASIN_FIELD_MASK = 0x0FFF for header packing/unpacking — independent of MAX_BASIN_IDX so BASIN_NONE still round-trips through the 12-bit field as a sentinel marker. pack_header/unpack_header switched from `& MAX_BASIN_IDX` (would wrongly clear bit 0 with 0xFFE) to `& BASIN_FIELD_MASK`. + 2 regression tests: basin_none_distinct_from_max_basin_idx, header_round_trips_max_basin_idx_and_basin_none_distinctly + MAX_BASIN_IDX + BASIN_NONE doctests updated to assert the new relationship. Gates: cargo test --features codec --lib hpc::codec → 55 passed (+5) cargo test --features codec --doc hpc::codec → 15 passed cargo fmt --all -- --check → clean cargo clippy --features codec --lib -- -D warnings → clean
|
Follow-up review on P1 — P1 — Gates:
Generated by Claude Code |
621-line synthesis doc consolidating two parallel architecture threads on PR-X12's predictive-coder substrate: - Session A: pr-x12-codec-cognitive-substrate-mapping.md (cognitive shaders ↔ gradient ↔ HEVC mapping, this branch) - Session B: pr-x12-cross-domain-synergies.md (x265 ⇄ 3DGS ⇄ attention ⇄ SGD, merged to master via PR #195 commit 01c77cc) Both originals retained for archeology; merged-canon supersedes them for new content. Structure: - §1 overlap matrix (epiphanies, holy grails, plans, debt — unique vs shared between A and B) - §2 architectural claim with two-route falsifiability - §3 ten merge-only epiphanies (M:E-A..M:E-J) — the high-density section: universal LinearReduce<Basis> kernel, Morton+raster as space-filling curve, PR #195 CodeRabbit findings generalise to all four domains, ndarray-codec as 5th architectural category, three domain-specific plug-points only - §4 unified holy grail list (A:H + B:HG + M:H + 2 new M:H-NEW) - §5 unified plan with resolved critical path (B's A7-first wins, front-load A4-design) - §6 sequencing DAG - §7 unified technical debt (P0/P1/P2/P3 reranked across both docs) - §8 resolves 4 disagreements between A and B (each with winner + rationale) - §9 read order for the 4-doc family - §10 compaction-preservation contract (8 invariants) Key merge-only outputs: - Three load-bearing traits: PredictiveSignal, LinearReduce<Basis>, CurveOrder<const N> - M:H-NEW-1: single Rust binary consumes video|splat|gradient|kv-cache → compressed Lance column (the falsifiability test, Plan G builds it) - M:H-NEW-2: ~1.5 KLoC generic + ~200 LoC per domain = ~2 KLoC for four industries (25× code-density vs per-domain impls) Three new integration plans: - Plan G: multi-domain bench harness (2 weeks, MUST ship before A7) - Plan H: extract `ndarray-codec` crate (3 days, resolves dep-cycle) - Plan I: trait PredictiveSignal in protocol (3 days, bit-pattern isomorphism with splat without code merge) Citation IDs preserved from both originals: cite as (A:E-N), (B:E-N), (M:E-X). No code changes; pure synthesis. Cargo check clean.
Two CodeRabbit MD040/MD027 findings + one Codex P2 stale-fact finding, all verified valid against current code. CodeRabbit (markdownlint): - pr-x12-codec-cognitive-substrate-mapping.md: 4 untyped code fences → ```text (Skip/Merge/Delta block at line 56; depth-0..3 quad-tree blocks for spatial/attention/gradient hierarchies at lines 80/91/104) - pr-x12-substrate-merged-canon.md: collapse `> - ` to `> - ` (lines 6-7, MD027 multi-space-after-blockquote), 2 untyped fences → ```text (architecture-rule list at line 165, sequencing diagram at line 411), and "delta IS" → "delta is" stylistic Codex P2 (stale doc vs shipped code): - Doc cited `MAX_BASIN_IDX = 4095` and framed BASIN_NONE collision as "pending / not yet merged / not yet pushed" — but PR #195 commit 2423298 already shipped `MAX_BASIN_IDX = 4094` with `BASIN_NONE = 4095` reserved sentinel, plus bijective `pack_leaf` via `?` operator with 3 regression tests. Updated: - §4.1 line 125: `MAX_BASIN_IDX = 4094` + sentinel-range explainer - §4.3 heading + body: "(resolved in PR #195)" with commit cite - §"Still open on PR #195" block: → "Resolved in PR #195 follow-up" - §12 debt-inventory T-1, T-2: marked ~~RESOLVED~~ with commit ref No content edits beyond what the findings asked for; citation IDs unchanged.
… codebook PR #195's follow-up commit 2423298 resolved the BASIN_NONE/MAX_BASIN_IDX collision by shrinking MAX_BASIN_IDX to 4094 with BASIN_NONE = 4095 as a distinct sentinel. The shrink was the wrong fix — the sentinel itself was the bug. Reference (the authoritative source the code drifted from): src/hpc/ogit_bridge/assets/cognitive/entities/Leaf.ttl — the HHTL ontology defines the codebook as 16 Hips × 16 Twigs × 16 Leaves = 4096 Leaves per Heel, every Leaf carrying a real basinSignature. No slot is reserved for absence; the ontology forbids "no basin". What BASIN_NONE actually was: authoring-time epistemic uncertainty ("encoder hasn't decided yet for this cell") smuggled into wire-format ontological uncertainty ("this cell exists but has no basin"). A category error — Rust's idiom would have surfaced this as Option<u16> in the encoder's transient scratch state and a plain u16 on the persistent record. The sentinel value bypassed that hygiene by hiding optionality inside a magic u16, invisible to the typechecker. Confirmation that the sentinel was load-bearing for nothing: full audit of BASIN_NONE producers across src/ found zero — no LeafCu::skip(...), no basin_idx: BASIN_NONE anywhere. The only references were the const definition, a re-export, one regression test that constructed it, and predict.rs:is_no_basin (a predicate with no callers that emit BASIN_NONE leaves). Defensive infrastructure for a feature the ontology forbids and the implementation never used. Changes: - mode.rs: MAX_BASIN_IDX = (1 << 12) - 1 = 4095 (full 12-bit range, 4096 real basins). BASIN_NONE deleted. The 2 sentinel-distinctness regression tests replaced with one test asserting the full range. Docs reframed around the HHTL ontology + the Option<u16> scratch-state contract. - predict.rs: is_no_basin deleted; its only doctest+test deleted. BASIN_NONE import dropped. - mod.rs: BASIN_NONE and is_no_basin removed from re-exports. - pr-x12-codec-cognitive-substrate-mapping.md §4.1/§4.3/§10.1/§12: framing updated. §4.3 now records the full arc (collision → shrink → delete) as design archeology rather than as an open issue. Validation: - cargo check --features codec --lib → clean - cargo clippy --features codec --lib -- -D warnings → clean - cargo test --features codec --lib hpc::codec → 54 tests pass - cargo test --features codec --doc hpc::codec → 13 doctests pass Net diff: -77, +38 lines. Pure subtraction modulo doc reframe.
Summary
Two codec sub-cards on top of PR-X12 A1 (
Ctu/LeafCu):A2 —
src/hpc/codec/mode.rs(~270 lines, 11 tests)basin_idxMergeDir2-bit pack/unpack (high bits masked)packed_byte_len()const fn for buffer pre-sizingMAX_BASIN_IDX(4095) +BASIN_NONEsentinelA3-intra —
src/hpc/codec/predict.rs(~330 lines, 12 tests)IntraContext { basin_idx, delta_i32, [Option<&LeafCu>; 4] NESW }IntraConfig { escape_next_idx: Option<u32> }predict_intra()decision tree: Skip → Merge → Delta → Escape(monotone wire cost 2 → 3 → 3 → 6 bytes; cheapest-fit policy)
basin_idx+ same δ asu8(sign-tolerant wrapping cast)Deferred to follow-up: A3-inter (cross-tier neighbour scan from
BlockedGridL2/L3), A4 transform, A6 RDO, A7 rANS entropy coder, A8 stream framing.Test plan
cargo check --features codec --lib— cleancargo test --features codec --lib hpc::codec— 45 passed (22 ctu + 11 mode + 12 predict)cargo fmt --all -- --check— cleancargo clippy --features codec --lib -- -D warnings— cleanResultand forcing callers to handleGenerated by Claude Code
Summary by CodeRabbit
New Features
Tests
Documentation